knitr::opts_chunk$set(echo = TRUE)
# Load libraries for homework problems
library(tidyverse)
library(gt)
library(patchwork)
# Read in COVID-19 data
# R/make_data.R creates this file
cv19 <- read_csv('data/usa_covid19.csv')
The COVID-19 pandemic is an ongoing public health emergency in the United States (US) and worldwide. Since 2020-01-21, the New York Times has monitored and shared COVID-19 data (see GitHub repo here) from across the US at the state and county level.
I have modified the New York Times data to include each state's population. The data are described below:
c("date" = "Date",
  "state" = "State in the US",
  "cases_total" = "Total number of cases as of date",
  "deaths_total" = "Total number of deaths as of date",
  "pop_2015" = "Estimated population as of 2015"
) %>%
  enframe() %>%
  gt(rowname_col = "name") %>%
  tab_stubhead(label = 'Variable name') %>%
  cols_label(value = 'Variable description') %>%
  cols_align('right') %>%
  tab_footnote(locations = cells_body(rows = 5, columns = 2),
               footnote = "Source: usmap::countypop") %>%
  tab_footnote(locations = cells_body(columns = 2, rows = 2),
               footnote = 'US = United States') %>%
  tab_header(title = 'Dictionary for New York Times COVID-19 data',
             subtitle = paste("Last updated:", max(cv19$date)))
Dictionary for New York Times COVID-19 data
Last updated: 2020-04-03

| Variable name | Variable description |
|---|---|
| date | Date |
| state | State in the US (1) |
| cases_total | Total number of cases as of date |
| deaths_total | Total number of deaths as of date |
| pop_2015 | Estimated population as of 2015 (2) |

(1) US = United States
(2) Source: usmap::countypop
The data (cv19) are printed below:
cv19
Create two new columns in cv19:
cases_new: the number of new cases identified on a given day for a given state.
deaths_new: the number of new deaths confirmed on a given day for a given state.
Notes:
The lag() function is helpful for this.
Your solution should look like this:
read_rds('solutions/01_solution.rds')
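As a rough illustration of the lag() hint, here is a minimal sketch on a small made-up data set (the `toy` object and its numbers are hypothetical, not the NYT data). It assumes that the first recorded day for each state should count its whole total as new, which is why `default = 0` is used; the solution file may handle that first day differently (for example, leaving it as NA).

```r
library(dplyr)

# Toy data with made-up numbers; column names follow the data dictionary
toy <- tibble(
  date = as.Date("2020-03-01") + c(0, 1, 2, 0, 1, 2),
  state = rep(c("A", "B"), each = 3),
  cases_total = c(1, 3, 6, 0, 2, 2),
  deaths_total = c(0, 0, 1, 0, 1, 1)
)

toy_new <- toy %>%
  # lag() must be applied within each state, in date order
  group_by(state) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    # Assumption: the first day's new count equals its total (default = 0)
    cases_new = cases_total - lag(cases_total, default = 0),
    deaths_new = deaths_total - lag(deaths_total, default = 0)
  ) %>%
  ungroup()

toy_new
```

Grouping by state before calling lag() matters: without it, the first row of one state would be differenced against the last row of the previous state.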
Compute the total number of new cases identified and deaths confirmed each day in the USA on or after March 1st, 2020. Your summarized data should look like this:
read_rds('solutions/02_solution.rds')
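One possible way to approach this summary is to filter by date, group by date, and sum over states. The sketch below uses a hypothetical `toy_new` data set with made-up numbers and assumes the per-state columns are named `cases_new` and `deaths_new` as in problem 1; the output column names in the solution file may differ.

```r
library(dplyr)

# Toy per-state daily counts (made-up numbers, not the NYT data)
toy_new <- tibble(
  date = rep(as.Date(c("2020-02-29", "2020-03-01", "2020-03-02")), times = 2),
  state = rep(c("A", "B"), each = 3),
  cases_new = c(1, 2, 3, 0, 2, 4),
  deaths_new = c(0, 0, 1, 0, 1, 0)
)

usa_daily <- toy_new %>%
  # Keep days on or after March 1st, 2020
  filter(date >= as.Date("2020-03-01")) %>%
  # One row per day, summing over all states
  group_by(date) %>%
  summarise(
    cases_new = sum(cases_new),
    deaths_new = sum(deaths_new),
    .groups = "drop"
  )

usa_daily
```

With real data, `sum(..., na.rm = TRUE)` may be needed if the first day of each state has a missing new count.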
Using the data created in problem 2, create two bar plots showing the number of new cases identified and deaths confirmed in the USA on or after March 1st, 2020.
Notes: This is a great chance to learn about the patchwork R package.
Your solution should look like this:
read_rds('solutions/03_solution.rds')
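To give a feel for how patchwork combines plots, here is a minimal sketch with a hypothetical `usa_daily` data frame of made-up numbers. The layout (cases stacked above deaths) and the axis labels are assumptions for illustration and may not match the solution plot.

```r
library(ggplot2)
library(patchwork)

# Toy national daily counts (made-up numbers, not the NYT data)
usa_daily <- data.frame(
  date = as.Date("2020-03-01") + 0:4,
  cases_new = c(4, 7, 12, 20, 31),
  deaths_new = c(0, 1, 1, 3, 5)
)

# geom_col() draws bars whose heights are the values in the data
p_cases <- ggplot(usa_daily, aes(x = date, y = cases_new)) +
  geom_col() +
  labs(x = NULL, y = "New cases")

p_deaths <- ggplot(usa_daily, aes(x = date, y = deaths_new)) +
  geom_col() +
  labs(x = "Date", y = "New deaths")

# patchwork: `/` stacks plots vertically; `+` or `|` places them side by side
p_both <- p_cases / p_deaths

p_both
```

Stacking the two panels keeps the date axes aligned, which makes it easier to compare the timing of cases and deaths.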
Add four new columns to the data you created in problem 1:
cases_per100k: Number of cases per 100,000 citizens
deaths_per100k: Number of deaths per 100,000 citizens
cases_dbl_days: Number of days until case count doubles, based on current day’s case count
deaths_dbl_days: Number of days until death count doubles, based on current day’s death count.
Challenge yourself:
filter the data you have created in this problem to contain only the most recent day.
Identify the 10 states that have the highest death rate per 100,000 citizens.
Tabulate the total number, rate, and days to double for cases and deaths in each of these 10 states.
Your solution should look like this:
read_rds('solutions/04_solution.rds')
Ten states in the US with highest death rates due to COVID-19
Data presented for: 2020-04-03

| State | Cases: total count | Cases: rate per 100k | Cases: no. days to double | Deaths: total count | Deaths: rate per 100k | Deaths: no. days to double |
|---|---|---|---|---|---|---|
| New York | 102,870 | 519.7 | 9.2 | 2,935 | 14.8 | 9.4 |
| Louisiana | 10,297 | 220.5 | 8.0 | 370 | 7.9 | 5.2 |
| New Jersey | 29,895 | 333.7 | 5.9 | 647 | 7.2 | 5.0 |
| Michigan | 12,670 | 127.7 | 5.7 | 478 | 4.8 | 6.8 |
| Washington | 6,966 | 97.2 | 17.3 | 293 | 4.1 | 13.0 |
| Connecticut | 4,915 | 136.9 | 3.5 | 132 | 3.7 | 5.6 |
| Massachusetts | 10,402 | 153.1 | 6.2 | 192 | 2.8 | 4.1 |
| Vermont | 389 | 62.1 | 6.6 | 17 | 2.7 | Inf |
| District of Columbia | 757 | 112.6 | 6.3 | 15 | 2.2 | 4.0 |
| Colorado | 4,182 | 76.6 | 8.2 | 110 | 2.0 | 6.9 |
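The rate and doubling-time columns can be sketched as follows on a hypothetical `toy` data set with made-up numbers. The doubling-time definition here is an assumption: it divides the current total by the current day's new count, that is, the number of days it would take to add the current total again if today's increase repeated every day. The problem statement does not spell out the formula, so the solution file may use a different one. Under this reading, a day with zero new deaths yields `Inf`, which would be consistent with the `Inf` shown for Vermont in the table above.

```r
library(dplyr)

# Toy data (made-up numbers); column names follow the data dictionary
toy <- tibble(
  date = rep(as.Date(c("2020-04-02", "2020-04-03")), times = 3),
  state = rep(c("A", "B", "C"), each = 2),
  cases_total = c(80, 100, 40, 50, 10, 10),
  deaths_total = c(4, 5, 2, 2, 0, 1),
  pop_2015 = rep(c(100000, 200000, 50000), each = 2)
)

toy_rates <- toy %>%
  group_by(state) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    cases_new = cases_total - lag(cases_total),
    deaths_new = deaths_total - lag(deaths_total),
    # Counts per 100,000 people
    cases_per100k = cases_total / pop_2015 * 1e5,
    deaths_per100k = deaths_total / pop_2015 * 1e5,
    # Assumed definition: days to double if today's new count repeated daily
    cases_dbl_days = cases_total / cases_new,
    deaths_dbl_days = deaths_total / deaths_new
  ) %>%
  ungroup()

# Most recent day only, then the states with the highest death rate
# (n = 2 for the toy data; the problem asks for 10)
top_states <- toy_rates %>%
  filter(date == max(date)) %>%
  slice_max(deaths_per100k, n = 2) %>%
  select(state, cases_total, cases_per100k, cases_dbl_days,
         deaths_total, deaths_per100k, deaths_dbl_days)

top_states
```

The resulting data frame could then be passed to gt() to build a table like the one above, for example with spanner labels for the case and death columns.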
Learn something new: take a look at a famous flipbook created by Gina Reynolds. The cv19 data have a very similar structure to that of the flipbook in Gina’s talk. Learn about the ggplot2 tools that are used in the flipbook and try to adapt them to create the ‘racing bar chart’ below.